Project 2#

Francesca Michielli (fcm2118)

November 22, 2024

This project looks at NYC shootings rate (defined as number of shootings per 1,000 residents) by police precinct from 2018 - 2023

  • Data on shootings can be found here

  • Population data can be found here

  • Police precinct boundaries can be found here

1. Import and Clean Shootings Data#

import plotly.io as pio
pio.renderers.default = "vscode+jupyterlab+notebook_connected"
import pandas as pd
import plotly.express as px

The shootings data below is a list of every shooting incident in NYC from 2006 - 2023. Each observation has an incident key, occurrence date, location (including precinct), and other details about the incident.

  • Shootings data from here

  • Downloaded 11/14/24

shootings = pd.read_csv("NYPD_Shooting_Incident_Data.csv")

The following commands allow us to preview the data:

shootings.head()
INCIDENT_KEY OCCUR_DATE OCCUR_TIME BORO LOC_OF_OCCUR_DESC PRECINCT JURISDICTION_CODE LOC_CLASSFCTN_DESC LOCATION_DESC STATISTICAL_MURDER_FLAG ... PERP_SEX PERP_RACE VIC_AGE_GROUP VIC_SEX VIC_RACE X_COORD_CD Y_COORD_CD Latitude Longitude Lon_Lat
0 244608249 05/05/2022 00:10:00 MANHATTAN INSIDE 14 0.0 COMMERCIAL VIDEO STORE True ... M BLACK 25-44 M BLACK 986050.000 214231.000000 40.754692 -73.993500 POINT (-73.9935 40.754692)
1 247542571 07/04/2022 22:20:00 BRONX OUTSIDE 48 0.0 STREET (null) True ... (null) (null) 18-24 M BLACK 1016802.000 250581.000000 40.854402 -73.882330 POINT (-73.88233 40.854402)
2 84967535 05/27/2012 19:35:00 QUEENS NaN 103 0.0 NaN NaN False ... NaN NaN 18-24 M BLACK 1048632.000 198262.000000 40.710634 -73.767773 POINT (-73.76777349199995 40.71063412500007)
3 202853370 09/24/2019 21:00:00 BRONX NaN 42 0.0 NaN NaN False ... M UNKNOWN 25-44 M BLACK 1014493.000 242565.000000 40.832417 -73.890714 POINT (-73.89071440599997 40.832416753000075)
4 27078636 02/25/2007 21:00:00 BROOKLYN NaN 83 0.0 NaN NaN False ... M BLACK 25-44 M BLACK 1009149.375 190104.703125 40.688443 -73.910219 POINT (-73.91021857399994 40.68844345900004)

5 rows × 21 columns

shootings.count()
INCIDENT_KEY               28562
OCCUR_DATE                 28562
OCCUR_TIME                 28562
BORO                       28562
LOC_OF_OCCUR_DESC           2966
PRECINCT                   28562
JURISDICTION_CODE          28560
LOC_CLASSFCTN_DESC          2966
LOCATION_DESC              13585
STATISTICAL_MURDER_FLAG    28562
PERP_AGE_GROUP             19218
PERP_SEX                   19252
PERP_RACE                  19252
VIC_AGE_GROUP              28562
VIC_SEX                    28562
VIC_RACE                   28562
X_COORD_CD                 28562
Y_COORD_CD                 28562
Latitude                   28503
Longitude                  28503
Lon_Lat                    28503
dtype: int64
shootings.dtypes
INCIDENT_KEY                 int64
OCCUR_DATE                  object
OCCUR_TIME                  object
BORO                        object
LOC_OF_OCCUR_DESC           object
PRECINCT                     int64
JURISDICTION_CODE          float64
LOC_CLASSFCTN_DESC          object
LOCATION_DESC               object
STATISTICAL_MURDER_FLAG       bool
PERP_AGE_GROUP              object
PERP_SEX                    object
PERP_RACE                   object
VIC_AGE_GROUP               object
VIC_SEX                     object
VIC_RACE                    object
X_COORD_CD                 float64
Y_COORD_CD                 float64
Latitude                   float64
Longitude                  float64
Lon_Lat                     object
dtype: object

The following code creates a column that represents the year of the incident:

shootings['Year'] = pd.DatetimeIndex(shootings['OCCUR_DATE']).year
shootings.head()
INCIDENT_KEY OCCUR_DATE OCCUR_TIME BORO LOC_OF_OCCUR_DESC PRECINCT JURISDICTION_CODE LOC_CLASSFCTN_DESC LOCATION_DESC STATISTICAL_MURDER_FLAG ... PERP_RACE VIC_AGE_GROUP VIC_SEX VIC_RACE X_COORD_CD Y_COORD_CD Latitude Longitude Lon_Lat Year
0 244608249 05/05/2022 00:10:00 MANHATTAN INSIDE 14 0.0 COMMERCIAL VIDEO STORE True ... BLACK 25-44 M BLACK 986050.000 214231.000000 40.754692 -73.993500 POINT (-73.9935 40.754692) 2022
1 247542571 07/04/2022 22:20:00 BRONX OUTSIDE 48 0.0 STREET (null) True ... (null) 18-24 M BLACK 1016802.000 250581.000000 40.854402 -73.882330 POINT (-73.88233 40.854402) 2022
2 84967535 05/27/2012 19:35:00 QUEENS NaN 103 0.0 NaN NaN False ... NaN 18-24 M BLACK 1048632.000 198262.000000 40.710634 -73.767773 POINT (-73.76777349199995 40.71063412500007) 2012
3 202853370 09/24/2019 21:00:00 BRONX NaN 42 0.0 NaN NaN False ... UNKNOWN 25-44 M BLACK 1014493.000 242565.000000 40.832417 -73.890714 POINT (-73.89071440599997 40.832416753000075) 2019
4 27078636 02/25/2007 21:00:00 BROOKLYN NaN 83 0.0 NaN NaN False ... BLACK 25-44 M BLACK 1009149.375 190104.703125 40.688443 -73.910219 POINT (-73.91021857399994 40.68844345900004) 2007

5 rows × 22 columns

Next I create a new dataframe that groups the shootings by precinct and year.

  • This gives a count of the number of shootings in each precinct in each year

shootings_grouped = shootings.groupby(["PRECINCT","Year"])["INCIDENT_KEY"].count().reset_index()
shootings_grouped.head()
PRECINCT Year INCIDENT_KEY
0 1 2007 1
1 1 2008 1
2 1 2009 1
3 1 2010 8
4 1 2012 2

Rename the column to “Shooting Count”

shootings_grouped = shootings_grouped.rename(columns={'INCIDENT_KEY':'Shooting Count'})
shootings_grouped.head()
PRECINCT Year Shooting Count
0 1 2007 1
1 1 2008 1
2 1 2009 1
3 1 2010 8
4 1 2012 2

Upon investigation, it appears that there are some precinct-year combinations that are not in the dataset. This is because there are some precincts that have zero shootings in a given year. To fill out the dataset with the missing values, I created a new dataframe of every year-precinct combination and merged the current dataframe to that new dataframe.

Note that I found this issue after mapping the data and seeing blank precincts on the map. I looked at the data and found that not all precincts had reported shootings in every year. Thus, these precint-year observations do not show up in the collapsed data.

First, get a list of all years and precincts:

years = shootings_grouped["Year"].unique()
years.sort()
precincts = shootings_grouped["PRECINCT"].unique()
precincts.sort()

Next, create a list of every year-precinct combination:

combinations = []
for i in years:
    for n in precincts:
        combinations.append((n,i))

combinations[:10]
[(1, 2006),
 (5, 2006),
 (6, 2006),
 (7, 2006),
 (9, 2006),
 (10, 2006),
 (13, 2006),
 (14, 2006),
 (17, 2006),
 (18, 2006)]

Then, turn this list into a dataframe:

all_observations = pd.DataFrame(combinations, columns=['Precinct', 'Year'])
all_observations.head()
Precinct Year
0 1 2006
1 5 2006
2 6 2006
3 7 2006
4 9 2006

Merge the original dataframe onto this new dataframe with the complete list of observations:

all_observations_comb = pd.merge(
    all_observations, 
    shootings_grouped, 
    left_on=["Precinct", "Year"],
    right_on=["PRECINCT",	"Year"],
    how = "left")

all_observations_comb.head()
Precinct Year PRECINCT Shooting Count
0 1 2006 NaN NaN
1 5 2006 5.0 3.0
2 6 2006 6.0 1.0
3 7 2006 7.0 8.0
4 9 2006 9.0 3.0
all_observations_comb = all_observations_comb.drop("PRECINCT", axis=1)

Replace all the shooting counts that have “NaN” with 0:

all_observations_comb["Shooting Count"] = all_observations_comb["Shooting Count"].fillna(0).astype(int)

all_observations_comb
Precinct Year Shooting Count
0 1 2006 0
1 5 2006 3
2 6 2006 1
3 7 2006 8
4 9 2006 3
... ... ... ...
1381 115 2023 6
1382 120 2023 25
1383 121 2023 2
1384 122 2023 2
1385 123 2023 2

1386 rows × 3 columns

2. Import and Clean Population Data#

The data below shows population by police precint:

  • Data found here

  • Information defining column headers found here

link = "nyc_precinct_2020pop.csv"
population = pd.read_csv(link)
population.head()
precinct P1_001N P1_002N P1_003N P1_004N P1_005N P1_006N P1_007N P1_008N P1_009N ... P2_064N P2_065N P2_066N P2_067N P2_068N P2_069N P2_070N P2_071N P2_072N P2_073N
0 1 84799 76937 57501 2655 96 14624 55 2006 7862 ... 1 0 0 0 0 0 0 0 0 0
1 5 50598 47663 13363 3556 109 27507 69 3059 2935 ... 0 1 0 1 0 0 0 0 0 0
2 6 64643 59481 50320 1782 72 5635 34 1638 5162 ... 0 0 0 0 0 0 0 0 0 0
3 7 57985 51746 18416 6934 396 17779 46 8175 6239 ... 0 0 0 0 0 0 0 0 0 0
4 9 75951 68112 41843 6610 480 11322 84 7773 7839 ... 0 3 3 0 0 0 0 0 0 0

5 rows × 145 columns

Clean the data by renaming the appropriate column to ‘Total Population’ and creating a new dataframe with only the two relevant columns:

population = population.rename(columns={'P1_001N':'Total Population'})

pop = population[['precinct','Total Population']]
pop
precinct Total Population
0 1 84799
1 5 50598
2 6 64643
3 7 57985
4 9 75951
... ... ...
72 115 179134
73 120 122308
74 121 128149
75 122 144552
76 123 100738

77 rows × 2 columns

3. Merge the Shootings Data with Population Data#

  • Merge the two datasets on precinct

  • Calculate the shooting rate per precinct

merged = pd.merge(all_observations_comb, 
         pop, 
         left_on=["Precinct"],
         right_on=["precinct"])

merged
Precinct Year Shooting Count precinct Total Population
0 1 2006 0 1 84799
1 5 2006 3 5 50598
2 6 2006 1 6 64643
3 7 2006 8 7 57985
4 9 2006 3 9 75951
... ... ... ... ... ...
1381 115 2023 6 115 179134
1382 120 2023 25 120 122308
1383 121 2023 2 121 128149
1384 122 2023 2 122 144552
1385 123 2023 2 123 100738

1386 rows × 5 columns

Calculate the shooting rate per precinct per year (shootings per 1000 residents)

merged['shooting_rate'] = merged['Shooting Count']/merged['Total Population']*1000
merged
Precinct Year Shooting Count precinct Total Population shooting_rate
0 1 2006 0 1 84799 0.000000
1 5 2006 3 5 50598 0.059291
2 6 2006 1 6 64643 0.015470
3 7 2006 8 7 57985 0.137967
4 9 2006 3 9 75951 0.039499
... ... ... ... ... ... ...
1381 115 2023 6 115 179134 0.033494
1382 120 2023 25 120 122308 0.204402
1383 121 2023 2 121 128149 0.015607
1384 122 2023 2 122 144552 0.013836
1385 123 2023 2 123 100738 0.019853

1386 rows × 6 columns

4. Import Boundaries for Police Precincts#

Police Precinct boundaries in geojson form found here

import requests

response = requests.get("https://data.cityofnewyork.us/resource/kmub-vria.geojson")
shapes = response.json()
print("loaded")
loaded
shapes["features"][0]["properties"]
{'precinct': '1', 'shape_leng': '80093.5344285', 'shape_area': '47285006.2592'}

5. Map Shootings per 1,000 residents by Precinct#

Create a function to map the shootings rate by precinct for a given year

def plot_nyc(df, year):
    """ This function makes a chloropleth map of NYC, 
    using a DataFrame with a precinct and a shootings_per_capita column"""
    fig = px.choropleth_map(
        df,
        locations="precinct",  
        color="shooting_rate",  
        geojson=shapes,
        featureidkey="properties.precinct",  
        center={"lat": 40.71, "lon": -73.98},
        zoom=9,
        height=600,
        title= (f"Shootings per 1,000 Residents Across Precinct: {year}"),
        range_color=(0, 1),
        labels = {"shooting_rate":"Shootings per 1,000"}
    )

    fig.show()

plot_nyc(merged[merged["Year"]==2020], 2020)

Exclude central park because it is an outlier:

plot_nyc(merged[(merged["precinct"] != 22) 
         & (merged["Year"]==2020)], 2020)

Plot maps for years 2018 - 2023

  • This allows us to visualize the change in shooting rates per precinct over time

for i in [2018, 2020, 2023]:
    plot_nyc(merged[(merged["precinct"] != 22) 
             & (merged["Year"]==i)], i)

Takeaways#

  • These maps show how the intensity of shootings, measured as the number of shootings per 1,000 in each police precinct, changed over time. As can be seen in the maps, the blue and purple districts have relatively lower shooting rates, while the yellow and orange districts have higher rates.

  • These maps indicate that the intensity of shootings increased in 2020 and 2021 compared to 2018 and 2019, then fell in 2023.

  • Shootings are primarily concentrated in parts of Brooklyn, the Bronx, and Upper Manhattan, but didn’t impact each area at the same time. In other words, 2020 was worse for Brooklyn, while 2021 and 2022 was worse for the South Bronx and upper Manhattan.

  • These maps highlight a link between the onset of the Covid pandemic and gun violence in New York City.

  • There are many more ideas to be explored here. For instance, this analysis does not indicate the severity of shootings, nor does it tell us about the different racial or economic groups who are impacted by shootings.